Unit 4: Data Cleanse Transforms 


• Modifying the phone file for other countries: By default, Data Cleanse includes phone 
number patterns for many countries. However, if you need parsing for a 
country that is not included, you can modify the international phone file driphint.dat so 
that Data Cleanse detects phone number patterns that follow a different format. You add 
new phone number patterns to the international phone file using regular 
expressions. 


• Using personal ID numbers: With a default Data Quality installation, Data Cleanse can 
identify US Social Security numbers and separate them into discrete components. If your 
data includes personal identification numbers that differ from US SSNs, you can 
use User-Defined Pattern Matching to identify them. You set up the number formats that 
User-Defined Pattern Matching should identify using regular expressions. 


• Using cleansing packages: Cleansing packages enhance the ability of Data Cleanse to 
accurately process various forms of global data by including language-specific 
reference data and parsing rules. Since cleansing packages are based on the 
standard Data Cleanse transform, you can use the sample transforms in your projects in 
the same way you would use Data Cleanse and gain the advantage of enhanced reference 
data and parsing rules. 
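
The first two items above both rely on regular expressions to describe a number format. As a rough illustration of that idea only, the following Python sketch uses the standard `re` module to split a number into named components. The two formats and the names `PHONE_PATTERN`, `ID_PATTERN`, and `parse_number` are hypothetical. The sketch does not reproduce the syntax of driphint.dat or of User-Defined Pattern Matching, which should be taken from the product documentation.

```python
import re

# Hypothetical phone format (an assumption for illustration):
# +<country code> <2-digit area code> <8-digit subscriber number>
PHONE_PATTERN = re.compile(
    r"^\+(?P<country>\d{1,3})[ -](?P<area>\d{2})[ -](?P<number>\d{8})$"
)

# Hypothetical personal ID format (an assumption for illustration):
# two letters, six digits, one trailing letter.
ID_PATTERN = re.compile(r"^(?P<prefix>[A-Z]{2})(?P<serial>\d{6})(?P<check>[A-Z])$")


def parse_number(text, pattern):
    """Return the named components if text matches pattern, otherwise None."""
    match = pattern.match(text.strip())
    return match.groupdict() if match else None


print(parse_number("+61 29 12345678", PHONE_PATTERN))
print(parse_number("AB123456C", ID_PATTERN))
print(parse_number("123-45-6789", ID_PATTERN))  # does not fit the ID format
```

The named groups play the role of the discrete components mentioned above: a value that matches is broken into parts, and a value that does not match is left unparsed.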


Cleansing Packages 


The Cleansing Package Builder 
What does Cleansing Package Builder do for me? 


• Empowers the Data Steward / Subject Matter Expert to drive their own data cleansing 
solution. 


• Allows the business user to easily and quickly: 


  - Develop new cleansing solutions for data domains that SAP does not provide out of the 
    box. For example, cleanse product data for domains such as the Manufacturing or 
    Financial industries. 


  - Customize the name/firm cleansing packages that SAP sells today. 


  - Provide direction on how their data should be classified on output, leveraging visual 
    tools. 


• Cleansing Package Builder automatically creates the data dictionary, rules, and patterns 
that make up a cleansing package. 


• A Cleansing Package is used by the Data Services’ Data Cleanse transform during 
processing. 


The default Data Cleanse cleansing package consists of name and firm data, which is not 
required for parsing operational data. If you use a completely custom dictionary, Data Cleanse 
does not have to consider the default name and firm data, so parsing is faster and more 
accurate. 


Based on your analyzed data, you can create new output categories and fields where Data 
Cleanse can place parsed and standardized data. For data sets where the input data arrives in 
only a few different orders, a single output category is sufficient. However, if your data can 
arrive in any order, you can use multiple output categories to reduce the number of rules needed. 


Cleansing Package-based classifications add meaning to terms when assigned to primary 
dictionary entries. For example, the primary entry blue might be assigned the classification 
color, meaning Data Cleanse can identify blue as a color term. 
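
To make the idea of classifications concrete, here is a minimal Python sketch of a dictionary lookup. It is not how Data Cleanse or Cleansing Package Builder is implemented. The `CLASSIFICATIONS` table, the `classify` function, and every entry other than blue/color are hypothetical and chosen only for illustration.

```python
# Hypothetical dictionary: primary entry -> set of classifications.
# Only the blue/color pairing comes from the text; the rest is assumed.
CLASSIFICATIONS = {
    "blue": {"color"},
    "large": {"size"},
    "glove": {"product"},
}


def classify(text):
    """Split text on whitespace and look up each token's classifications."""
    result = []
    for token in text.lower().split():
        labels = CLASSIFICATIONS.get(token, {"unknown"})
        result.append((token, sorted(labels)))
    return result


print(classify("Large blue glove"))
```

Once each term carries a classification, later steps can work with the classifications (for example, a size followed by a color followed by a product) rather than with the literal words.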